DECLARATION OF ATTORNEY UNDER 37 C.F.R. §1.131

I, Marc P. Schuyler, hereby declare as follows. 

1. All statements set forth herein made on my personal knowledge are true, and all 
statements made on information and belief are believed to be true. 

2. I am the attorney who drafted and filed the patent application identified above. 



3. A copy of the attached invention disclosure document (Exhibit A) submitted by the 
inventor, Craig M. Wittenbrink, was received in the Hewlett-Packard Laboratories Legal 
Department (hereinafter "HP Labs") in Palo Alto, California, USA on November 11, 1999. 
The invention disclosure document included the attached document entitled "True 
Transparency with the Fragment Buffer Graphics Architecture" (hereinafter "True 
Transparency Article"), authored by Craig M. Wittenbrink, along with a copy of Craig M. 
Wittenbrink's lab notebook dated between August 31, 1998 and October 8, 1999 (hereinafter 
"Lab Notebook"). 

4. Exhibit A is a true and correct copy of the invention disclosure document submitted 
to HP Labs by Craig M. Wittenbrink on November 11, 1999. The invention disclosure 
document, the True Transparency Article, and the Lab Notebook are hereby incorporated into 
and form part of the present Declaration in their entireties. 

5. I relied on the information contained in the invention disclosure document, the True 
Transparency Article, and the Lab Notebook in drafting the above-identified patent 
application. As such, the invention claimed in the above-identified patent application 
directly correlates to the information contained in the invention disclosure document, the 
True Transparency Article, and the Lab Notebook, as shown in the examples below. 

a) All of the features of Claim 1 of the present application are disclosed in 
Figures 2-4 and Section 2 (page 3) of the True Transparency Article. For instance, Figure 2 
shows the claimed first storage (Z-buffer and Color buffer) and the claimed fragment buffer 
which holds multiple fragments for overlapping data (Fragment buffer). The one of instructions 
and hardware that causes a device to perform various functions is disclosed in Section 2, 
and the detection of the predetermined one of closest and furthest visible data for 
a pixel location is disclosed with respect to the example depicted in Figure 4. 

b) Figures 2-4 and Section 2 of the True Transparency Article also disclose all of 
the elements of Claims 11, 17, 19, and 22 of the present application. In addition, the 
disclosure contained in Section 5.1 of the True Transparency Article discloses all of the 
elements of Claims 11, 17, 19, and 22 of the present application. For instance, page 17 of the 
True Transparency Article refers to Figure 23 as showing a single pixel representing 
overlapping data. Page 17 also recites that rasterized fragments are inserted (or stored) into 
"two buffers to determine the closest opaque fragment and the furthest unoccluded 
transparent fragment." In addition, the pseudo-code shown in Figure 18, considered with the 
definitions provided on page 16 of the True Transparency Article, discusses the remaining 
steps claimed in Claims 11, 17, and 19 and the means for successively detecting and blending 
claimed in Claim 22 of the present application. 






PATENT 



Atty Docket No.: 10001077-1 
App. Ser. No.: 09/881,424 



c) Based at least upon these correlations between the claimed invention and the 
disclosure contained in the True Transparency Article, I believe that Craig M. Wittenbrink 
was in complete possession of and conceived of the invention claimed in the present 
application at least on November 11, 1999, the date upon which Craig M. Wittenbrink 
submitted the invention disclosure document to HP Labs. 

6. I worked reasonably hard to draft the above-identified patent application from before 
July 19, 2000 to the date at which the above-identified application was filed, June 14, 2001, 
as evidenced at least by the facts listed below. 

a) During the period between July 19, 2000 and June 14, 2001, I managed the 
patent prosecution department at HP Laboratories in Palo Alto, California, in addition to 
performing other legal duties. In these capacities, approximately half of my time was spent 
supervising and preparing patent applications and in-house prosecution attorneys, and the 
approximate other half of my time was spent preparing, reviewing, and negotiating various 
agreements for HP Labs. At various times, I spent a larger portion (than fifty percent) of my 
time working on the various agreements. 

b) In regard to the supervision and preparation of patent applications during the 
period between July 19, 2000 and June 14, 2001, I oversaw the preparation and filing of 
approximately 100 new patent applications by outside counsel. In addition, during calendar 
years 2000 and 2001, I personally drafted a number of new patent applications (at least 10 
new patent applications), which included the present application. Moreover, I had a 
reasonably long backlog of cases to be reviewed and/or prepared that dated back from 1999 
and 2000, which I worked on in substantially chronological order. 

c) The general practice at HP Labs during the period between July 19, 2000 and 
June 14, 2001, and prior, included the review of received invention disclosure documents on 
a somewhat quarterly basis to determine whether to proceed with preparation and filing of 
patent applications based upon the invention disclosure documents. 

d) My records indicate that the attached invention disclosure document came up 
for initial review in the March 2000 quarterly patent coordination meeting. However, the 
attached invention disclosure document was not reviewed until the June 2000 patent 
coordination meeting because the March 2000 meeting was, in all likelihood, canceled. In 
any event, the decision to proceed with preparation and filing of a patent application based 
upon the attached invention disclosure document was made during the June 2000 patent 
coordination meeting. 

e) I docketed this invention disclosure to be worked on in due course, 
considering my backlog of other unrelated cases and my other duties as an attorney manager 
for the legal department at HP Labs. I began preparing the present application in November- 
December 2000, and was actively seeking inventor response to technical questions I had, as 
reflected by an email I sent to Craig M. Wittenbrink on December 21, 2000. I also recall that 
in late December 2000, I drove from San Francisco to Los Angeles, during which time I 
dictated various parts of the present application, along with other, unrelated patent 
applications I was drafting at that same time. 

f) I continuously worked on preparing the present application while performing 
my other duties following December 2000. I possessed a first rough draft of the patent 
application and discussed it with the inventor, Craig Wittenbrink, in roughly January- 
February 2001, but my draft required some substantial rework. I therefore prepared a 
second draft of the patent application which I delivered to Craig M. Wittenbrink with 
drawings no later than April 9, 2001. Moreover, a decision to not foreign file the present 
application was made during the May 2001 patent coordination meeting. Following at least 
one further revision based upon Craig M. Wittenbrink's comments, I filed the present 
application with the United States Patent and Trademark Office on June 14, 2001. 

I, Marc P. Schuyler, acknowledge that willful false statements and the like are punishable by 
fine or imprisonment, or both (18 U.S.C. 1001) and may jeopardize the validity of the 
application or any patent issuing thereof. 

[signature]




Marc P. Schuyler 

Instructions: The information contained in this document is COMPANY CONFIDENTIAL and may not be disclosed to others without prior 
authorization. Submit this disclosure to the HP Legal Department as soon as possible. No patent protection is possible until a patent application is 
[illegible] submitted to [illegible].



Name:

Product Name or Number:

Was a description of the invention published, or are you planning to publish? If so, the date(s) and publication(s):

Was a product including the invention announced, offered for sale, sold, or is such activity proposed? If so, the date(s) and location(s):

Was the invention disclosed to anyone outside of HP, or will such disclosure occur? If so, the date(s) and name(s):

Was the invention described in a lab book or other record? If so, please identify (lab book #, etc.):

Was the invention built or tested? If so, the date:

B. Problems solved by the invention. [handwritten entry, illegible]

C. Advantages of the invention over what has been done before. [handwritten entry, illegible]

D. Description of the construction and operation of the invention (include appropriate schematic, block, & timing diagrams; drawings; 
samples; graphs; flowcharts; computer listings; test results; etc.) 

TO: MARC SCHUYLER 

FROM: CRAIG M. WITTENBRINK 

SUBJECT: TRUE TRANSPARENCY WITH THE FRAGMENT BUFFER GRAPHICS ARCHITECTURE 

DATE: 11/09/99 



DESCRIPTION OF INVENTION 



a. prior solutions and their disadvantages 

Prior solutions include: 1) software reordering of graphics primitives at the application level, for 
example as necessary in OpenGL [OpenGL92]; 2) multipass rendering with a dedicated augmented 
frame buffer [citation illegible]; 3) [illegible] to 4 levels with a hardware [illegible]; 
4) software A-buffer renderer [Carpenter84]; 5) screen door transparency such as in 
the SGI Reality Engine [Akeley93]; 6) a dedicated sorting network and RAM buffering [Baker94]. 

The disadvantages of the software techniques (1, 2, 4) are the inefficiency of having an 
application sort primitives. The sorting is viewpoint dependent, so for 3D graphics rendering, as a 
viewpoint is altered a new sorting of all primitives would be done, or held in a previously created data 
structure such as an octree. Such presorting may require cutting primitives that are intersecting other 
primitives to break potential cycles. 

The disadvantages of the previous hardware techniques are either quality trade-offs, such as only 
supporting a fixed number of layers (3), reducing the spatial resolution via a dithering technique 
called screen door transparency, which also supports only a fixed number of transparent layers (5), or 
tremendous cost with many dedicated memories and circuitry only for the sorting (6). 

There has been no prior solution that provides economical true transparency without changing 
the application, in a graphics 3D rendering architecture. 

b. problems solved by the invention 

The problem solved by the invention is how to compute a 3D graphics rendering of surfaces that 
are partially transparent, while providing the ease of use of a traditional Z-buffering architecture. The 
problem is a long standing one in graphics, and involves the proper sorting of the primitives in the 
depth ordering along view rays. A primary advantage of the invention is how to solve this in 
hardware, with an economical (few gates) method. Also, there are no trade-offs in quality, as any 
number of layers at a given pixel may be supported, and the memory cost is amortized across the 
entire screen. 



c. description of the construction and operation of the invention 

The invention is fully disclosed in the attached HP Confidential technical report [citation illegible]. 
This invention would be added as described to a 3D rendering graphics ASIC or ASICs, to 
implement true transparency in hardware. A proposed hardware architecture, as well as specific 
hardware control, are given in the referenced technical report. 

d. advantages of the invention over what has been done before 

The advantages of the invention over what has been done before are: the invention does not 
require modification of the application; the invention does not force the application to sort primitives 
(triangles) to [illegible] properly; the invention is economical, as it requires only simple modifications to 
the Z-comparison and compositing logic of existing Z-buffering [illegible]; the invention provides 
new features without compromising existing features; and the invention may also incorporate 
antialiasing. 

REFERENCES: 

[Akeley93] Kurt Akeley, RealityEngine Graphics. In Proceedings of SIGGRAPH, pages 109-116, 
Anaheim, CA, August 1993. ACM. 

[Carpenter84] Loren Carpenter, The A-buffer, an antialiased hidden surface method. In 
Proceedings of SIGGRAPH, pages 103-108. ACM, July 1984. Vol. 18, No. 3. 

[Baker94] Stephen J. Baker, Dennis A. Cowdrey, Graham J. Olive, and Karl J. Wood, Image 
generator for generating perspective views from data defining a model having opaque and 
translucent features. United States Patent Number 5,363,475, Nov. 8, 1994. 

[Kelley94] Michael Kelley, Kirk Gould, Brent Pease, Stephanie Winner, and Alex Yen, Hardware 
accelerated rendering of CSG and transparency. In Proceedings of SIGGRAPH, pages 177-184, 
Orlando, FL, July 1994. ACM. 

[Mammen89] Abraham Mammen, Transparency and antialiasing algorithms implemented with 
the virtual pixel maps technique. IEEE Computer Graphics and Applications, 9(4):43-55, July 1989. 

[OpenGL92] OpenGL Architecture Review Board. OpenGL Reference Manual. Addison- 
Wesley, Reading, MA, 1992. 

[Winner97] Stephanie Winner, Michael Kelley, Brent Pease, and Alex Yen. Hardware accelerated 
rendering of antialiasing using a modified A-buffer algorithm. In Proceedings of SIGGRAPH, pages 
307-316, Los Angeles, CA, August 1997. ACM. 

[Wittenbrink98] Craig M. Wittenbrink, True Transparency with the Fragment Buffer Graphics 
Architecture. HP Labs Confidential Technical Report, HPL-99-T6D, Nov. 1999. 

True Transparency with the Fragment Buffer Graphics Architecture 

Craig M. Wittenbrink 
Hewlett-Packard Labs 
1501 Page Mill Road 
Palo Alto, CA 94304 

November 10, 1999 



The fragment buffer is a new method for providing computation for true transparency of rendered 
fragments. True transparency is provided without altering the application, without requiring the 
application to sort the data, and without the deficiencies of previous methods such as screen door 
transparency and the [illegible] buffer. The fragment queue or fragment buffer can compute true 
transparency with any number of layers. A variant of the fragment buffer that was designed for minimal 
hardware complexity, with maximum algorithmic improvement, is simulated. Statistics are shown for a 
variety of different scenes using a trace based methodology, and an instrumented Mesa(TM) OpenGL 
implementation. The fragment buffer is shown to require from 2.1 to 3.6 times more memory than 
traditional Z-buffering to provide true transparency. Details are provided, including the state 
transition diagrams, next state table, and architectural schematics. The fragment buffer can also be 
used for antialiasing, and an example of Carpenter's classical A-buffer antialiasing is shown. A key 
invention of antialiasing is to modify Carpenter's recursive algorithm into an iterative front-to-back 
processing. 



1 Introduction 

The Fragment Buffer is about achieving greater graphics visual realism through novel use of resources. This 
paper discusses the architecture and shows examples of a rendering simulator that uses the fragment buffer. 
Sutherland, Sproull and Schumacker [11] explained hidden surface algorithms as sorting to determine what 
is visible on the screen. Object space primitives are sorted to screen space locations in X, Y, and Z. Most 
architectures compare the Z location with the existing values in a Z buffer, and decide what can be thrown 
out, or overwritten into the Z buffer. The technique is simple, and fast in hardware. It has dominated 
graphics architectures for nearly two decades. But there are deficiencies. It is difficult to use Z buffering 
with complex shading and/or texturing algorithms. The reason is that a large number [text missing] 
overwritten, so expensive shading can be overly burdensome. Another primary difficulty is that Z buffering 
is a read modify write, and so an actual sort is not being done. Therefore, true transparency is not possible 
efficiently on a Z buffering architecture. An additional difficulty of Z buffering is that antialiasing is expensive 
and requires expanding the Z buffer to the number of subsamples used for antialiasing. 

Compute intensive shading can be done with a sort last approach [7]. If the sorting of an object primitive 
to the screen is left until the last step of the graphics pipeline, it is called a sort last technique, from Molnar 
et al.'s proposed parallel rendering taxonomy. PixelFlow [8] used sort last so that all pixels were determined 
to be visible or not, and then shaded, which makes shading and/or texturing work proportional to the frame 
buffer size, not the object complexity. As models become huge, this is an important advantage. PixelFlow's 
primary difficulty is the bandwidth and subdivision necessary to composite entire screens between different 
graphics pipelines. A sort last approach solves the inefficiencies of the Z buffering architecture, but does not 
provide correct transparency. 

Improved methods for correct transparency have been investigated by Mammen [6], Carpenter [3], Kelley 
et al. [5, 12], and Baker et al. [2]. The proposed techniques are either software only [3], require multiple 
passes of rendering the geometry [5, 12, 6], and/or only render a fixed number of transparent levels correctly 
[5, 12]. Transparency is a challenging problem to solve in hardware, and other techniques such as screen 
door, or sorting the polygons back-to-front in the application, have been used. There are quality problems 
with screen door transparency [1], essentially a dithering technique, and requiring applications to send the 
polygons in sorted order is not general to legacy applications or easy to do. 

Improved methods for antialiasing have been investigated throughout the graphics literature, and numerous 
techniques exist. In hardware, antialiasing has been done most directly through supersampling [1, 8]. 
Adaptations of the A-buffer [3] to hardware have also been investigated, with partial implementations due 
to the generality of the A-buffer approach [12]. The difficulty of implementing antialiasing economically has 
meant that most graphics architectures support antialiasing with multiple passes through the geometry. 

Figure 1 on the bottom row shows what happens in OpenGL when rendering 3 transparent squares of 
red, green, and blue. A different image results from each different drawing order, even though the 3 squares 
have a fixed Z depth location. On the top row of Figure 1, different drawing order does not impact the visual 
appearance. This shows the results of true transparency, with the invention of the fragment buffer. 



Figure 1: Top row, fragment buffer, same appearance. From near to far, the squares are ordered Blue, Green, 
Red. Bottom row, OpenGL, different every time. 

By the addition of a memory, called the fragment buffer, proper transparent ordering and antialiasing 
can be economically implemented in hardware. I show how, for example, correct transparency can be 
implemented in such an architecture. I also show how the A-buffer algorithm and adaptive antialiasing 
or nonuniform sampling can be implemented. Essentially, a frame buffer is used for storing the closest 
opaque fragment, or the furthest transparent fragment if there aren't any opaque fragments. For pixels with 
additional fragments, those fragments are sent to the buffer along with their X and Y location. In successive 
passes, the fragments are considered, and composited [9] (Porter and Duff) into the frame buffer. Only 1 
pass is needed for processing the geometry, so no extra storage is needed for geometry, and a single fragment 
buffer is shared for the entire screen. This amortization of the extra storage over the entire screen allows 
unique savings over techniques with large per pixel dedicated storage. For architectures that do screen based 
subdivision, such a fragment buffer fits in naturally. Bucketization of primitives and/or fragments reduces 
the storage requirements of the X and Y location, as it is addressed only within a tile. 
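
The order dependence that motivates the fragment buffer can be illustrated with a minimal sketch of the Porter and Duff "over" operator. The colors, opacities, and depths below are hypothetical values chosen for illustration, not data from the report, and the software sort stands in for the ordering that the fragment buffer is meant to provide in hardware.

```python
from itertools import permutations

def over(src_rgb, src_alpha, dst_rgb):
    """Porter-Duff 'over': blend a source color onto an opaque destination."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

# Hypothetical fragments for one pixel: (name, rgb, alpha, z); smaller z is nearer.
fragments = [
    ("red",   (1.0, 0.0, 0.0), 0.5, 3.0),
    ("green", (0.0, 1.0, 0.0), 0.5, 2.0),
    ("blue",  (0.0, 0.0, 1.0), 0.5, 1.0),
]
background = (0.0, 0.0, 0.0)

def blend_in_order(frags):
    """Composite fragments onto the background in the order given."""
    color = background
    for _name, rgb, alpha, _z in frags:
        color = over(rgb, alpha, color)
    return color

def blend_sorted(frags):
    """Sort back to front by depth first, then composite."""
    return blend_in_order(sorted(frags, key=lambda f: f[3], reverse=True))

# Blending in submission order: the result changes with the draw order.
naive_results = {blend_in_order(p) for p in permutations(fragments)}
# Blending in depth order: one result regardless of the draw order.
sorted_results = {blend_sorted(p) for p in permutations(fragments)}

print(len(naive_results), "distinct colors without sorting")
print(len(sorted_results), "distinct color with back-to-front order")
```

With three half-transparent squares, every draw order yields a different color unless the fragments are composited in depth order, which is the behavior contrasted in the two rows of Figure 1.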

True transparency and adaptive antialiasing have only been attempted in hardware in high end graphics 
image generators for flight simulation [2]. The use of a fragment buffer is flexible and efficient for the 
calculation of proper transparency, without multiple passes of the geometry. Multiple passes of fragments 
are efficiently done, and many fragments are culled and eliminated with Z and occlusion testing. Screen 
space subdivision, which allows for the reduced communication requirements, also provides for reducing the 
amount of sorting necessary. And antialiasing can be done efficiently without dedicating a large amount of 
memory per pixel. The buffer area may be partitioned to provide efficient separation of different types of 
fragments, and the novel ease of nonuniform sampling in a hardware pipeline. Experiments show the required 
memory to support true transparency with highly detailed models to be from 2.1 to 3.6 times more memory 
than a traditional Z buffer. Additionally, a reformulation of Carpenter's software recursive antialiasing 
technique shows how to perform iterative front-to-back antialiasing with this proposed hardware. Many 
issues need to be investigated, such as the implementation of stencils, and the support of all OpenGL modes. 
But the advantages, and the proposed performance levels, make the architecture an advance beyond what 
has been possible. 

The main inventions describe how to achieve correct ordering for transparency and how to achieve 
antialiasing economically. Section 2 describes the fragment buffer architecture in the context of a Z-buffering 
graphics rendering architecture. Section 3 shows the state transition diagrams, and next state tables, for 
the comparison logic. Section 4 shows the results with several test data sets. Section 5 discusses fragment 
buffer variants, and Section 6 concludes the paper. 



2 Fragment Buffer Graphics Hardware 

To explore the most economical hardware, an invention that augments a conventional Z-buffer is explored. 
Many variations of the invention are possible, and are discussed further in Section 5. The scenario is as 
follows: the architecture is a standard graphics pipeline, with geometry processing (G), and rasterization 
(R). We add a fragment buffer and a 2nd Z storage to the frame buffer. This configuration provides the 
maximum advantage with the minimal additional memory. Figure 2 shows the architecture. Figure 3 shows 
more details of the new processing. A fragment coming from rasterization or from the fragment buffer is 
multiplexed into the fragment buffer comparison and controller. The fragment buffer is considered to be a 
circular queue, and if the queue overflows because of excessive fragments then it can be paged to system 
memory. Because accesses are sequential and used on a first-in first-out (FIFO) basis, performance will 
degrade gracefully. A fragment is defined as a point sample with color, opacity, and depth resulting from the 
rasterization (R), as in RGBAZ. Depending on the fragment's opacity, depth, and the previous state of the 
frame buffer at that location, a fragment may be stored on the fragment buffer, composited into the frame 
buffer, or discarded. The fragment buffer comparison and controller is where the Z-buffering comparison 
typically takes place. The Z-buffering is now augmented, and revised to [text missing] 
antialiasing. The processing is demonstrated through an example. 

Figure 2: Graphics architecture schematic plus added fragment buffer, and second Z buffer. 

Figure 3: Detailed diagram of interface to fragment buffer and frame buffer. 
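
As a rough software illustration of the fragment definition (RGBAZ plus screen location) and of the FIFO fragment buffer with overflow paging, consider the sketch below. The class names, the fixed capacity, and the use of a second queue to stand in for system memory are assumptions made for this example only; the report describes a hardware circular queue.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A point sample from rasterization: location, color, opacity, depth."""
    x: int
    y: int
    rgb: tuple
    alpha: float
    z: float

    @property
    def opaque(self):
        return self.alpha >= 1.0

class FragmentBuffer:
    """FIFO queue with a fixed capacity. Overflow spills to a second queue
    that stands in for system memory (an assumption of this sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.on_chip = deque()
        self.paged = deque()

    def push(self, frag):
        # Once anything has been paged out, later fragments must follow it
        # into the paged queue so that FIFO order is preserved.
        if self.paged or len(self.on_chip) >= self.capacity:
            self.paged.append(frag)
        else:
            self.on_chip.append(frag)

    def pop(self):
        frag = self.on_chip.popleft()
        if self.paged:
            # Refill the on-chip queue from the paged overflow.
            self.on_chip.append(self.paged.popleft())
        return frag

    def __len__(self):
        return len(self.on_chip) + len(self.paged)

buf = FragmentBuffer(capacity=2)
for i in range(5):
    buf.push(Fragment(x=0, y=0, rgb=(1.0, 1.0, 1.0), alpha=0.5, z=float(i)))
print([buf.pop().z for _ in range(5)])
```

Because reads and writes are strictly sequential, the overflow path only adds latency and never reorders fragments, which is the sense in which performance degrades gracefully.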



Figure 4 shows a pixel with 8 fragments, one of which is opaque, O. The transparent fragments are labelled 
T1 to T4 and Tx, Ty, and Tz. The fragments are drawn in the order shown, 1, 2, 3, ..., 8, and processing 
occurs as shown in the figure. Processing occurs by first considering the fragments during rasterization. 
This is phase 1. Then, if there have been fragments placed into the fragment buffer, following passes are 
performed. Only a single pixel per screen location is saved, so when multiple fragments that are unoccluded 
lie upon a single pixel, they are sent to the fragment buffer. This example has 6 passes: phase 1, phase 2, 
phase 3_1, phase 3_2, phase 3_3, and phase 3_4. For a shorthand notation, the frame buffer is labelled B, the 
next Z value, or 2nd Z value, is B_nz, next Z prime is B'_nz, and the fragment under consideration is F. 

Figure 4 shows how during phase 1, any opaque layers are found. In this case transparent fragments 
closer than the opaque layer were all queued, T1, T2, T3, and T4. An underline indicates the fragments in 
the fragment buffer. The transparent fragments beyond the opaque layer, Tx and Ty, were queued. In phase 
2, these fragments, Tx and Ty, are culled, and the furthest transparent layer's Z is saved. The fragments 
are processed in the same order each time as they are read from and written to the fragment buffer. Note 
that the true depth complexity of the pixel is 5, but we took 6 passes. In cases where no transparent 
fragments further than the opaque layer are queued, there is one less pass, 5. Phase 2 culls fragments Ty and Tx 
as they are further from the eye point than the opaque layer O. Next, in phase 2, fragment T1 has its Z 
value saved as B_nz, and is put on the fragment buffer, as shown by S. Fragments T2, T3, and T4 are also 
re-queued. In phase 3_1, the frame buffer, B, holds the opaque Z, O_z, and color attributes. B_nz, or NextZ, is 
also in the frame buffer, and holds the proper Z value for the furthest transparent layer, T1. In phase 3_1, the 
fragments come out of the fragment buffer in the same order that they were placed there. First, fragment 
T1 is read, and its Z value matches B_nz. Therefore, it is immediately composited with the frame buffer. 
The next fragment is read, T2, and it is the furthest Z that is closer than B_nz, so B'_nz = T2_z (B next 
Z prime) is written, and the fragment is re-queued (S). Fragments T3 and T4 are considered and re-queued 
(S). 

Phase 3_2 continues the same as phase 3_1, with the remaining fragments on the fragment buffer. These 
are T2, T3, and T4. Note that the B'_nz, or NextZprime, of phase 3_1 is the B_nz, or BnextZ, of phase 3_2. This 
alternation of the interpretation of B_nz and B'_nz continues for each even and odd phase 3 needed. The 2 
storage locations used are the frame buffer Z and the 2nd frame buffer Z. Always just 2 Z values per pixel. 

Phase 3_3 starts with a buffer, B, that is the previously composited opaque and furthest two transparent 
fragments. B_nz is the Z value of fragment T3. T3 is the first fragment considered, so it is matched 
with B_nz and also composited. Fragment T4's Z is written to B'_nz. Then in phase 3_4, fragment T4 is considered 
again, matches B_nz, and is composited into the final correct pixel color. Once the fragment buffer is emptied, 
processing completes for that frame. The fragment buffer must contain the location of the fragments, as 
fragments for the whole screen are intermixed on the fragment buffer. Figure 5 shows possible fragments 

[Figure 4 diagram, reconstructed from the scanned figure. Fragments along the view ray, from the eye 
outward: T4, T3, T2, T1, O (opaque), Tx, Ty, Tz. S marks a fragment sent to the fragment buffer. 

phase 1 (draw order Tz, Ty, Tx, T1, T2, T3, T4, O): Tz is stored in B; Ty sets B_nz and is queued (S); 
Tx, T1, T2, T3, and T4 are queued (S); O replaces B, Tz is discarded, and B_nz is invalidated. 
phase 2 (queue order Ty, Tx, T1, T2, T3, T4): Ty and Tx are culled as further than the opaque fragment; 
T1 sets B_nz and is re-queued (S); T2, T3, and T4 are re-queued (S). 
phase 3_1 (T1, T2, T3, T4): T1 has F_z = B_nz and is composited; T2 sets B'_nz and is re-queued (S); 
T3 and T4 are re-queued (S). 
phase 3_2 (T2, T3, T4): T2 is composited; T3 sets B'_nz and is re-queued (S); T4 is re-queued (S). 
phase 3_3 (T3, T4): T3 is composited; T4 sets B'_nz and is re-queued (S). 
phase 3_4 (T4): T4 is composited.] 

Figure 4: Fragment processing example, for Z-buffer, with extra Z, and fragment buffer. 



[Table 1 body not recoverable from the scan. Column headings: physical location, phase 1, phase 2, 
phase 3 odd, phase 3 even, phase 3 odd.] 

Table 1: Use of 2nd or extra Z buffer. The phase next Z prime (B'_nz) and B next Z (B_nz) alternate 
interpretation, while their location is in B_nz or B_z of the frame buffer physical location. See 
[illegible] on the left. 
on the fragment buffer, with location (X_o, Y_o) intermixed with (X_p, Y_p). The state machines that control 
processing for the fragment buffer are described in the next section. 

Xp, Yp, Z2, A2 
Xp, Yp, Z4, A4 
Xo, Yo, X, X    <- fragment from different location 
Xp, Yp, Z3, A3 
Xp, Yp, Z5, A5 

Figure 5: Fragments on the fragment buffer. 
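
The single-pixel walkthrough of Figure 4 can be approximated in software with the sketch below. It is a simplification under stated assumptions: phase 1 here queues every transparent fragment and keeps only the nearest opaque fragment in B, whereas the report's phase 1 also holds the first transparent fragment in the frame buffer. The depth values, colors, and the function name `render_pixel` are hypothetical. Smaller Z is taken to be nearer, and `b_nz` and `b_nz_prime` play the roles of B_nz and B'_nz.

```python
from collections import deque

def over(src_rgb, alpha, dst_rgb):
    """Porter-Duff 'over' onto an opaque destination color."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst_rgb))

def render_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """fragments: (name, rgb, alpha, z) tuples in draw order.
    Returns (final_rgb, composite_order, number_of_passes)."""
    b_rgb, b_z = background, float("inf")   # frame buffer B: color and Z
    queue = deque()                          # fragment buffer (FIFO)

    # Phase 1 (simplified): nearest opaque fragment goes to B, the rest queue.
    for frag in fragments:
        _name, rgb, alpha, z = frag
        if alpha >= 1.0:
            if z < b_z:
                b_rgb, b_z = rgb, z
        else:
            queue.append(frag)
    passes, order = 1, []
    if not queue:
        return b_rgb, order, passes

    # Phase 2: cull fragments at or behind the opaque Z and record the
    # furthest surviving transparent Z in b_nz.
    b_nz = None
    for _ in range(len(queue)):
        frag = queue.popleft()
        z = frag[3]
        if z >= b_z:
            continue                         # culled
        if b_nz is None or z > b_nz:
            b_nz = z
        queue.append(frag)                   # re-queued (S)
    passes += 1

    # Phase 3 passes: composite the fragment matching b_nz, track the next
    # furthest Z in b_nz_prime, then swap the two interpretations.
    while queue:
        b_nz_prime = None
        for _ in range(len(queue)):
            frag = queue.popleft()
            name, rgb, alpha, z = frag
            if z == b_nz:
                b_rgb = over(rgb, alpha, b_rgb)
                order.append(name)
            else:
                if b_nz_prime is None or z > b_nz_prime:
                    b_nz_prime = z
                queue.append(frag)           # re-queued (S)
        b_nz = b_nz_prime
        passes += 1
    return b_rgb, order, passes

# Hypothetical depths for the Figure 4 fragments, listed in draw order.
figure4_fragments = [
    ("Tz", (0.2, 0.2, 0.2), 0.5, 8.0), ("Ty", (0.4, 0.4, 0.4), 0.5, 7.0),
    ("Tx", (0.6, 0.6, 0.6), 0.5, 6.0), ("T1", (1.0, 0.0, 0.0), 0.5, 4.0),
    ("T2", (0.0, 1.0, 0.0), 0.5, 3.0), ("T3", (0.0, 0.0, 1.0), 0.5, 2.0),
    ("T4", (1.0, 1.0, 0.0), 0.5, 1.0), ("O",  (1.0, 1.0, 1.0), 1.0, 5.0),
]
print(render_pixel(figure4_fragments))
```

Only two Z values per pixel are live at any time in this sketch, the frame buffer Z and one extra Z, and fragments with equal depth are composited in queue order, so the earlier fragment ends up behind the later one.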



3 State Machine Specification for Single Frame Buffer Solution 

Figures 6, 7, and 8 show the state transition diagrams of frame buffer pixels during processing. Figure 9 
shows the fragment buffer comparison and controller state, beginning with initialization into phase 1 after a 
new frame. The number of phases varies depending on the frame's depth complexity, as illustrated in the 
example in the previous section. Phase 1 is always first, and all fragments are considered during rasterization. 
After all fragments from rasterization have been processed, any fragments that were placed on the fragment 
buffer are considered in phase 2, and so on. The processing terminates when the fragment buffer is emptied. 
For phase 3, processing alternates between odd and even phases as shown in Figure 9. 

Each frame buffer location [text missing] state. So each pixel is a state machine; only, you just need to 
consider the state machine for the fragment location. The state machine requires 6 states for phase 1, 3 states 
for phase 2, and 3 states for phase 3. The 6 states have been labelled as shown in Table 2. 



BOTH_INVALID      initial state, no fragments seen here 
VALID_OPAQUE      1 opaque fragment seen and stored 
VALID_TRANS       1 transparent fragment seen and stored 
OPAQUE_INV        an opaque fragment stored closer than queued fragments 
BOTH_VALID_T_O    at least two fragments, opaque and transparent 
BOTH_VALID_T_T    at least two fragments, both transparent 

Table 2: State assignment definitions for the 6 states in phase 1. 
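
A minimal sketch of the phase 1 per-pixel state machine is given below, following the rows of the phase 1 next state table in this section for four of the six states. The fragment representation `(name, opaque, z)`, the `Pixel` class, and the function name `phase1_step` are assumptions made for illustration. Transitions out of OPAQUE_INV and BOTH_VALID_T_T are left unimplemented because their rows are not all available in this copy, and `None` is used where the table writes B_nz = 0 to invalidate the second Z.

```python
from enum import Enum, auto

class State(Enum):
    BOTH_INVALID = auto()
    VALID_OPAQUE = auto()
    VALID_TRANS = auto()
    OPAQUE_INV = auto()
    BOTH_VALID_T_O = auto()
    BOTH_VALID_T_T = auto()

class Pixel:
    def __init__(self):
        self.state = State.BOTH_INVALID
        self.b = None       # fragment held in the frame buffer B
        self.b_nz = None    # second (next) Z value, None when invalid

def phase1_step(pixel, frag, queue):
    """Apply one phase 1 transition. frag is (name, opaque, z); queue is the
    fragment buffer, modelled here as a plain list."""
    _name, opaque, z = frag
    s = pixel.state
    if s is State.BOTH_INVALID:
        pixel.b = frag
        pixel.state = State.VALID_OPAQUE if opaque else State.VALID_TRANS
    elif s is State.VALID_OPAQUE:
        if z >= pixel.b[2]:
            pass                                  # cull fragment
        elif opaque:
            pixel.b = frag                        # nearer opaque replaces B
        else:
            pixel.b_nz = z
            queue.append(frag)
            pixel.state = State.BOTH_VALID_T_O
    elif s is State.VALID_TRANS:
        b_z = pixel.b[2]
        if z > b_z:
            # New fragment is further: queue the old B and keep its Z as B_nz.
            queue.append(pixel.b)
            pixel.b_nz = b_z
            pixel.b = frag
            pixel.state = State.BOTH_VALID_T_O if opaque else State.BOTH_VALID_T_T
        elif opaque:
            pixel.b = frag                        # cull B, replace frame
            pixel.state = State.VALID_OPAQUE
        else:
            pixel.b_nz = z
            queue.append(frag)
            pixel.state = State.BOTH_VALID_T_T
    elif s is State.BOTH_VALID_T_O:
        if z >= pixel.b[2]:
            pass                                  # cull fragment
        elif opaque:
            pixel.b = frag
            if z <= pixel.b_nz:
                pixel.b_nz = None                 # invalidate B_nz
                pixel.state = State.OPAQUE_INV
        else:
            if z > pixel.b_nz:
                pixel.b_nz = z
            queue.append(frag)
    else:
        raise NotImplementedError(f"transitions from {s.name} are not sketched")

pixel, queue = Pixel(), []
for frag in [("O", True, 5.0), ("T1", False, 4.0), ("T2", False, 3.0), ("Tx", False, 6.0)]:
    phase1_step(pixel, frag, queue)
print(pixel.state.name, pixel.b_nz, [f[0] for f in queue])
```

In this usage example the opaque fragment arrives first, so the transparent fragments in front of it are queued, B_nz tracks the furthest of them, and the fragment behind the opaque layer is culled immediately.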

The same state assignments are reused for all 3 phases. Of course interpretation varies between phases. 
The 2nd and 3rd phases [text missing] is located where zero or 1 were seen in phase 
1. Phase 1 states have been separated into 3 columns. In the left column, no fragment has been seen, and 
therefore B and B_nz are both invalid. In the middle column, B is filled. In the right column B and B_nz are 
filled and at least 1 fragment is on the fragment buffer. 

For this implementation, the rules for equal valued depths are that the earlier fragment is in back of 
the following fragments. For a transparent fragment to be seen, it must be less than the opaque Z. After 
the fragment buffer is emptied, the rendering and compositing are complete. If there is not more than 
1 visible opaque or transparent fragment per pixel, then processing completes in phase 1. With only 2 
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Figure 6: Phase 1 frame buffer pixel state transition diagram. 
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Figure 7: Phase 2 frame buffer pixel state transition diagram, 
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current state 


inputs 


outputs/side-effects 


next state 


BOTfiJTOLID 


Fo 


B = F 


VALID.OPAQUE 


BOTHJNVALID 


Ft 


B = F 


VALID-TRANS 


VALID-OPAQUE 


F : >= B s 


none (cull fragment) 


VALID-OPAQUE 


VALID-OPAQUE 


Fa, Fx < B t 


B = F 


VALID-OPAQUE 


VALID.OPAQUE 


Ft, F z < B, 


B ni - F z , queue(F) 


BOTH.VALID-T.O 


VALID-TRANS 


F Qt F,<=B t 


B = F, (cull B replace frame) 


VALID.OPAQUE 


VALIDlTRANS 


F<,, F. > B z 


queue(B)J3 n . = B S ,B -F 


BOTH.VALID.T-0 


VALID-TRANS 


F t , F t <=B t 


B nj - Ft, queue(F) 


BOTH-VALID.T.T 


VALID-TRANS 


Ft, Ft > Bz 


quaie(JB), B nz = B 3 ,B= F 


BOTH.VALID-T-T 


B0TH.VALID-T-O 


{FA F s >= B t 


none (cull fragment) 


BOTH.VALED-T-O 




F 0 , F x < BtkF , > B m 


B -■ F 


BOTH.VALID-T.O 


BOTH-VALID-TiO 


Fo,F x <B s tiF x <=B„ : 


B - F,B nt = 0(invalidate Bnz) 


OPAQUEJNV 


BOmVALID-T-0 


F t ,F t <BjScF t > B ni 


B nc ~ F,,queae(F) 


BOTH-VALID-T-0 


BOtTHiVALID-T.O 


F,,F t <Bi&F t <*B n: 


queuc(F) 


BOTUVALID-T.O 


BOTHlyAtlD.tJ' 


F 0 , Ft > B c 


Bm = B r) queu0(B),J? = F 


BOTH.VAUDiT.O 


BOTIT-VALID-T T 


F 01 F t <= B t kF t > B n: 


B — F,(replace frame) 


BOTH-VALiD-T.O 


BOTHIVALID.T.T 


F ot F, <= S-&F, <= Bn. 


B =? F^ = 0(replace frame) 


OPAQUEJNV 


BOTifcVALID-T-T 


Ft, F S >B X 


Bni = #.,queue(J3), J9 = F 


BOTH-VAUD-T-T 


BOTHiVAIilD-T-T 


F t . F s <= B*&F, > Bn; 


B„ 2 .- F,,q«eue(F) 


BOTH-VALID.T.T 


BOTHiVALID.T-T 


Ft, F. <= B : kF, <= B ni 


qucue(F) 


BOTH-VALID-T.T 


OPAQf E JiSV 


F. >= B x 


none (cull fragment) 


OPAQUEJNV 


OPAQUEJNV 


F„,F S < B. 


B = F 


OPAQUEJNV 


OPAQUEJNV 


F„F : < B x 


queue(F) 


OPAQUEJNV 



Table 3: State transition table for phasel. queue(X) means to place fragment X on fragment buffer or 
queue. 5-fraine buffer, F-fragraent being considered, F p - fragment opaque, Ft- fragment transparent, (F z ) 
opaque/transparent don't care, F z ~> fragment depth value, B 9 , frame buffer depth value B nz - frame buffer 
next z depth value. 
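As an illustration, the phase 1 transitions of Table 3 can be sketched in Python. This is a minimal sketch rather than the implementation described in the paper: the names `Fragment`, `Pixel`, and `phase1` are hypothetical, an invalid B_nz is modelled as `None` instead of 0, and the fragment buffer is modelled as a per-pixel list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fragment:
    z: float          # depth; a smaller z is nearer the eye
    opaque: bool
    label: str = ""


@dataclass
class Pixel:
    state: str = "BOTH_INVALID"
    b: Optional[Fragment] = None       # B: fragment held in the frame buffer
    bnz: Optional[float] = None        # B_nz: next z (None when invalid)
    queue: List[Fragment] = field(default_factory=list)  # fragment buffer


def phase1(p: Pixel, f: Fragment) -> None:
    """Apply one Table 3 transition for fragment f at pixel p."""
    if p.state == "BOTH_INVALID":
        p.b = f
        p.state = "VALID_OPAQUE" if f.opaque else "VALID_TRANS"
    elif p.state == "VALID_OPAQUE":
        if f.z >= p.b.z:
            return                      # cull fragment
        if f.opaque:
            p.b = f
        else:
            p.bnz = f.z
            p.queue.append(f)
            p.state = "BOTH_VALID_T_O"
    elif p.state == "VALID_TRANS":
        if f.z > p.b.z:                 # queue(B), B_nz = B_z, B = F
            p.queue.append(p.b)
            p.bnz = p.b.z
            p.b = f
            p.state = "BOTH_VALID_T_O" if f.opaque else "BOTH_VALID_T_T"
        elif f.opaque:                  # cull B, replace frame
            p.b = f
            p.state = "VALID_OPAQUE"
        else:
            p.bnz = f.z
            p.queue.append(f)
            p.state = "BOTH_VALID_T_T"
    elif p.state == "BOTH_VALID_T_O":
        if f.z >= p.b.z:
            return                      # cull fragment
        if f.opaque:
            p.b = f
            if f.z <= p.bnz:            # opaque in front of queued fragments
                p.bnz = None
                p.state = "OPAQUE_INV"
        else:
            if f.z > p.bnz:
                p.bnz = f.z
            p.queue.append(f)
    elif p.state == "BOTH_VALID_T_T":
        if f.z > p.b.z:                 # B_nz = B_z, queue(B), B = F
            p.bnz = p.b.z
            p.queue.append(p.b)
            p.b = f
            if f.opaque:
                p.state = "BOTH_VALID_T_O"
        elif f.opaque:                  # replace frame
            p.b = f
            if f.z > p.bnz:
                p.state = "BOTH_VALID_T_O"
            else:
                p.bnz = None
                p.state = "OPAQUE_INV"
        else:
            if f.z > p.bnz:
                p.bnz = f.z
            p.queue.append(f)
    elif p.state == "OPAQUE_INV":
        if f.z >= p.b.z:
            return                      # cull fragment
        if f.opaque:
            p.b = f
        else:
            p.queue.append(f)


pixel = Pixel()
for frag in [Fragment(10, True, "O"), Fragment(5, False, "T5"),
             Fragment(7, False, "T7"), Fragment(12, False, "T12")]:
    phase1(pixel, frag)
```

In this example the opaque fragment at depth 10 stays in the frame buffer, the two nearer transparent fragments are queued for later phases, and the transparent fragment at depth 12 is culled.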
Figure 9: Overall state machine for the fragment buffer comparison and controller of Figure 3.



current state     inputs                         outputs/side-effects        next state
BOTH_VALID_T_O    F_z == B_nz                    B = composite(B, F)         [illegible]
BOTH_VALID_T_O    [illegible]                    B'_nz = F_z, queue(F)       BOTH_VALID_T_O
BOTH_VALID_T_O    [illegible], F_z <= B'_nz      queue(F)                    BOTH_VALID_T_O
BOTH_VALID_T_O    [illegible], F_z > B'_nz       [illegible]                 BOTH_VALID_T_O
OPAQUE_INV        F_z >= B_z                     none (cull fragment)        OPAQUE_INV
OPAQUE_INV        F_z < B_z & F_z > B_nz         [illegible]                 OPAQUE_INV
OPAQUE_INV        F_z < B_z & F_z <= B_nz        queue(F)                    OPAQUE_INV
(OPAQUE_INV provides same behavior as BOTH_VALID_T_O of phase 1)

Table 4: State transition table for phase 2. queue(X) means to place fragment X on fragment buffer or
queue. B - frame buffer, F - fragment being considered, F_o - fragment opaque, F_t - fragment transparent, (F_x)
opaque/transparent don't care, F_z - fragment depth value, B_z - frame buffer depth value, B_nz - frame buffer
next z depth value, composite(B, F) - Porter and Duff over operator or OpenGL composite.



current state     inputs          outputs/side-effects                                 next state
BOTH_VALID_T_O    Same as phase 2 above
OPAQUE_INV        F_z == B_nz     B = composite(B, F), B_z = B_nz (or [illegible]      BOTH_VALID_T_O
                                  already)
OPAQUE_INV        [illegible]     [illegible] = F_z, queue(F) (or [illegible])         BOTH_VALID_T_O

Table 5: State transition table for phase 3. OPAQUE_INV only occurs in phase 3_1, not in phase 3_2, phase
3_3, etc. Note that for phase 3, the B_nz and B'_nz are in different physical locations depending on even or
oddness of the phase. Phase 2 looks like an even phase 3.
fragments in a pixel that are not occluded, processing may complete in phase 2, and so on. Tables 3, 4, and 5
provide the state transition diagram, next state transitions as well as the side effects and outputs. In phase
3 the B_nz and B'_nz alternate their physical location as shown below in Table 6. The outputs and side effects
are given in sequential order, for example in Table 3, 7th row, "VALID_TRANS, F_o, F_z > B_z", the frame
buffer fragment is written to the fragment buffer, queue(B), its Z value saved as next z, B_nz = B_z, and then
overwritten by the fragment under consideration, B = F.



physical location : 


phase 1 


phase 2 


phase 3 odd 


phase 3 even 


phase 3 odd 


Bnz 


Bnz 






B ns 


B f 




Br 


B' nx 


B n: 


BL 


Bnz 



Table 6: Phase 3, physical location changing meaning from Bnz to Bnz' in odd and even. 

This state machine has been fully implemented and debugged, and several models have been run through
it as discussed in the next section.



4 Results

The fragment buffer implementation with a Z buffer and an extra Z buffer as described in the previous
section has been implemented. Statistics have been gathered on several models to provide an indication of
the performance implications. Table 7 provides statistics for processing of 4 scenes. All images were 512x512
pixels. The scenes as rendered are shown in Figures 11 to 14. Artifacts from OpenGL (middle images) include
the tire in the back seat of the chevy in Figure 13 and the nose gear on top of the helicopter in Figure 14.



data       no. of phase 3   total passes   Z bandwidth    frag bandwidth   avg depth
scene      6                8              2,765,189      7,497,201        2.35
spheres    10               12             8,178,293      [?],889,274      3.93
chevy      8                10             3,404,9[??]    8,396,718        2.17
heli       9                11             2,798,204      8,155,998        2.66

Table 7: Fragment buffer processing statistics, bandwidth in bytes, all 512x512 frames.

The necessary bandwidth to the memory holding the frame buffer, extra Z buffer, and fragment buffer
was computed during the execution of the simulation. This considers the memory model shown in Figure 2,
where the frame buffer, extra Z buffer, and fragment buffer are all in the same memory. Table 7 provides the
memory traffic for the conventional Z buffer (Z bandwidth), and the fragment buffer (frag bandwidth). The
depth complexity is also provided in the table, and is an average over pixels covered, for the case where all
geometry is considered transparent. Figure 10 plots the conventional Z buffering bandwidth against the fragment
buffer bandwidth. For these scenes, it can be seen that the number of passes may be high, on the order of 8
to 12 passes. For complex interpenetrating transparent objects the depth complexity can be arbitrarily high
at a given pixel. For example, in unstructured volume rendering, the depth complexity will be much higher,
as there are thousands of layers for some pixel locations. But, because the application sorting of the data
prior to rendering is such a burden, even the numerous passes of the fragment buffer will achieve superior
results.

The key thing about the bandwidth numbers is that they are on the same order as the Z buffering,
and there will also not be any texturing at the time that the fragments are being sorted. Therefore, the
performance impact may be modest, because the first pass will be similar to Z buffering, and the subsequent
passes will not be competing with texture mapping operations. The ratio of Z buffering traffic to total
fragment buffer traffic varies from 2.5 to 4.3 in these examples. These examples are also severe, in that all
surfaces are transparent, so a new capability will place different stresses on the system. True transparency
is not as easy as Z buffering, so some difference in processing is expected. The number of passes for these
scenes is not expected to vary with higher resolutions, but the number of fragments will increase in both Z
buffering and fragment buffer processing.




Figure 10: Bandwidth for the four scenes shown in Table 7. Conventional Z buffering traffic in bytes is
compared to rendering all surfaces as partially transparent with the fragment buffer. Those with the highest
scene complexity and depth complexity require more bandwidth.




Figure 11: A scene of a cone, torus, and sphere. The left image is Z buffering, the middle image is OpenGL,
and the right image is the fragment buffer.

Figure 15 shows an example of the multiple passes that are taken. In the phase 1 pass, in the upper
left, the rearmost fragments are determined, and placed into the frame buffer. On the next pass, in phase 2,
the next furthest transparent fragments are composited into the frame buffer. In these renderings, the front and
back faces of triangles are shaded, as the front and rear of the sphere are visible with all surfaces slightly
transparent.

The implementation was also rigorously validated with permutations of cases of 3 transparent levels, as
shown in the introduction, and with 3 transparent levels and an opaque layer that was placed in all possible
locations: in front of all three, at the same depth as all three, behind all three, or in between the 1st and
2nd or 2nd and 3rd. The transparent layers could be drawn in 6 orders. The opaque layer was placed at 7
different locations, and the opaque layer could be drawn in 4 different orders with respect to the transparent
layers. Figure 16 shows 3 examples from the 168 (6*7*4) combinations that were verified.

Figure 12: A scene of 7 intersecting spheres. The left image is Z buffering, the middle image is OpenGL,
and the right image is the fragment buffer. This scene is meant to compare to Mammen's Figure 4, where
he [...]

Figure 13: A scene of a [...]. The left image is Z buffering, the middle image is OpenGL, and the right
image is the fragment buffer.

Figure 14: A scene of a Caligari trueSpace model of an Apache helicopter. The left image is Z buffering,
the middle image is OpenGL, and the right image is the fragment buffer.
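As a quick check of the 168 (6*7*4) count, the test matrix can be enumerated in Python. This is only a sketch under stated assumptions: the labels are hypothetical, "at the same depth as all three" is read as three separate placements (one per transparent layer) so that the placements total 7, and the 4 draw orders of the opaque layer are read as its 4 possible positions in the draw sequence relative to the three transparent layers.

```python
from itertools import permutations, product

# 6 draw orders of the three transparent layers.
transparent_orders = list(permutations(("T1", "T2", "T3")))

# 7 depth placements of the opaque layer (assumed reading of the text).
opaque_depths = [
    "in front of all three",
    "same depth as 1st",
    "same depth as 2nd",
    "same depth as 3rd",
    "between 1st and 2nd",
    "between 2nd and 3rd",
    "behind all three",
]

# 4 positions of the opaque layer in the draw sequence:
# before all, after the first, after the second, or after all transparent layers.
opaque_draw_slots = range(4)

cases = list(product(transparent_orders, opaque_depths, opaque_draw_slots))
```

Each element of `cases` names one combination that a validation harness would render and compare against the expected back-to-front composite.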

From the simulation, conclusions can be made regarding the size of the memory needed to support this
functionality. The frame buffer is assumed to contain a fragment, which I define simply as R, G, B, A,
taking 5 bytes. Each color component is 1 byte, and alpha typically needs higher precision for
compositing, which would be necessary for an architecture supporting true transparency. In the examples
[...], so an alpha buffer isn't actually needed in this instance but
has been assumed. So, resolution * 5 is the frame buffer storage in bytes. Additionally, a Z buffer is used,
and this is assumed to be 4 bytes. For fragment buffer processing an additional Z buffer is used, for
an additional [...]. It is assumed that the 3 bits needed for state per pixel can be within this 4
bytes, or simply added, 4 bytes + 3 bits.

Now the actual size of the fragment buffer to be used is at least as large as the number of fragments
to be stored in the fragment buffer on phase 1. This is somewhat arbitrary, as it depends on the depth
complexity (DC) of the scene (fully transparent depth complexity), and the percent of the frame buffer
covered. The needed memory is [...] * percent covered * added storage. The added storage is straightforward
to [...] Z [...] (4) [...], and the address of
the pixel [...] 3. In this [...] 3 bytes would be [...] to address
[...]. Table 7 detailed the bandwidth [...]
memory needed for these datasets. The direct measurements show that from 2.13 to 3.63 times memory is
needed for a system with the fragment buffer. The resolutions can be scaled up, and the assumption that
the same percentage of pixels is covered, and that the depth complexity stays the same, means the ratios
are approximately the same. The only [...] is the size of the address necessary to
store on the fragment buffer itself. For a 512x512 frame buffer 18 bits may be used, slightly more bits for
higher resolutions: 1280x1024 with 21 bits, and 1600x1200 with 22 bits.

This increases the ratios of additional memory needed only slightly (2.13 to 2.17 for the helicopter).



data       frame buffer   extra Z   fragment buffer   total fragment buffer   total frag/frame buffer
scene      2.25 MB        1 MB      1.57 MB           4.82 MB                 2.14
spheres    2.25 MB        1 MB      4.92 MB           8.17 MB                 3.63
chevy      2.25 MB        1 MB      1.85 MB           5.10 MB                 2.27
heli       2.25 MB        1 MB      1.55 MB           4.80 MB                 2.13

Table 8: Fragment buffer processing statistics, minimum memory usage in MBytes, for the 512x512 frame
buffer (constant), extra Z (constant), fragment buffer, total of fragment buffer, frame buffer, and extra Z,
and a ratio of the total fragment buffer memory versus the frame buffer for traditional Z buffering.
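The constant columns and the ratios of Table 8 can be reproduced with a few lines of Python. This is a sketch under assumptions: the function and variable names are hypothetical, MB is taken as 2^20 bytes, the 2.25 MB frame buffer is read as 5 bytes of RGBA plus 4 bytes of Z per pixel, and the scene and chevy fragment buffer sizes (1.57 and 1.85) are inferred from the printed totals because those two entries are damaged in the scan.

```python
MIB = 1024 * 1024  # assumption: the table's "MB" means 2**20 bytes


def frame_buffer_bytes(width, height, color_bytes=5, z_bytes=4):
    """Traditional frame buffer: RGBA fragment plus Z per pixel."""
    return width * height * (color_bytes + z_bytes)


def extra_z_bytes(width, height, z_bytes=4):
    """Additional Z buffer used by the fragment buffer scheme."""
    return width * height * z_bytes


# Fragment buffer sizes in MB as read from Table 8
# (scene and chevy inferred from the totals column).
fragment_buffer_mb = {"scene": 1.57, "spheres": 4.92, "chevy": 1.85, "heli": 1.55}


def memory_totals(width=512, height=512):
    """Return {name: (total MB, ratio of total to traditional frame buffer)}."""
    fb = frame_buffer_bytes(width, height) / MIB
    ez = extra_z_bytes(width, height) / MIB
    result = {}
    for name, frag in fragment_buffer_mb.items():
        total = fb + ez + frag
        result[name] = (total, total / fb)
    return result


totals = memory_totals()
```

With these assumptions the ratios come out between about 2.13 (helicopter) and 3.63 (spheres), matching the range quoted in the text.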



But, for tile based hardware, there are additional improvements possible because of the reduced address
to be stored with each fragment in the fragment buffer. For example a 1600x1200 screen requires 11 bits each
for X and Y, for a 22 bit overhead per fragment. Tiling by 64x64 tiles requires 12 bits for X and Y, versus
22 bits, a 10 bit saving for fragments. The savings is a fixed percentage no matter how large you decide to
make the fragment buffer, ranging from 10% to 15% for the 512x512 to 2048x2048 resolutions considered with a
64x64 tile. For most graphics the depth complexity is considerably less, so less memory would be consumed.
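The address widths quoted above follow from the number of bits needed to index each axis. A minimal Python sketch (the function name `address_bits` is hypothetical):

```python
def address_bits(width, height):
    """Bits needed to store an (X, Y) pixel address for a width x height area."""
    def bits(n):
        # Bits required to represent the indices 0 .. n-1.
        return (n - 1).bit_length()
    return bits(width) + bits(height)


full_screen = address_bits(1600, 1200)   # 11 + 11 bits
tile = address_bits(64, 64)              # 6 + 6 bits
saving = full_screen - tile              # bits saved per fragment by tiling
```

The same function gives 18 bits for 512x512 and 21 bits for 1280x1024, in agreement with the figures in the text.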
5 Fragment Buffer Variants

The fragment buffer implementation as described is targeted for lowest cost with reasonable performance.
A continuum of choices trading off memory versus number of passes is possible. In addition the memory
overhead is reduced for tile based architectures. Antialiasing can also be supported, which provides for
an adaptive supersampling strategy. Table 9 shows the number of passes needed for various hardware
complexities. Case 1 is 1 frame buffer, for which 2n passes are needed, where n is the worst case transparent
layer depth complexity. Case 2 is the [...] frame [...]
of Z. The number of passes is halved to n. This case 2 also [...] tile or region
compositing is needed, because compositing occurs when the fragment z value matches the next z value,
F_z = B_nz. Case 4 is the most general, where there are N frame buffers available for the entire [...] (or
tile). In this case only n/N passes are necessary, but with the obvious increase in necessary memory.



case   description                            total passes
1      1 frame buffer                         O(2n)
2      1 frame buffer and 1 Extra Z buffer    O(n)
3      2 frame buffers                        O(n)
4      N frame buffers                        O(n/N)

Table 9: Trading off memory versus number of passes. Case 2 is the one presented in the previous section.
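To make the trade-off in Table 9 concrete, the pass counts can be tabulated for a given depth complexity. The sketch below treats the orders of growth as literal counts, which ignores the constant overheads of a real implementation; the function name and signature are hypothetical.

```python
import math


def total_passes(case, n, num_buffers=1):
    """Approximate pass count for Table 9, for n transparent layers."""
    if case == 1:            # 1 frame buffer
        return 2 * n
    if case in (2, 3):       # frame buffer plus extra Z, or 2 frame buffers
        return n
    if case == 4:            # N frame buffers
        return math.ceil(n / num_buffers)
    raise ValueError("case must be 1, 2, 3 or 4")


# Example: a worst case of 8 transparent layers.
passes_by_case = {
    1: total_passes(1, 8),
    2: total_passes(2, 8),
    3: total_passes(3, 8),
    4: total_passes(4, 8, num_buffers=4),
}
```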
In this section we briefly [...]
architecture [...]. I show an example
of processing to implement [...]. The DeferredZ
architecture is a pipeline that delays the Z compares and sorting, and also allows deferred shading to be
performed. Figure 17 shows [...]. From the left, primitives are sent across
AGP (Accelerated Graphics Port) [...] stage. The output of this stage
are the [...]. The next stage is the hierarchical
Z-culling or culling, where hierarchical Z-culling is performed only for a region of the screen. Following this
is XY rasterization of primitives that are not occluded. In this case rasterization means conversion of the
triangle data to raster XY coordinate location fragments, although all shading attributes such as normals
are forwarded with the fragment.

The next stage is the fragment buffer compare logic as described earlier, with compositing moved later
to follow shading. The fragment stack is used for temporary storage to consider surviving fragments.
Fragments after the first pass are sent along to the lighting, coloring, and shading phase of the pipeline.
Because the primitives are sent in the proper order, they can be directly composited once all lighting
calculations have been done. After this point primitives are sent directly into the frame buffer, which is
distributed in the screen bucket fashion. It is assumed that buckets would be square screen tiles, but this
architecture could also use scanline parallelism or pixel interleaved parallelism.

There are four memories in the system, labelled A, B, C, and D. There are two new memories in B: the
Fragment Buffer and additional Z-buffers and attribute storage. There may be none or a small number of
additional Z-buffers. Figures 21 and 22 show N = 2, and other cases of N = 0 and N = 1 are analyzed also.
The pipelines are truly independent, and no intercommunication is required between them for single pass
rendering. This assumes that the texture memory is replicated so each pipeline has unfettered access to the
textures that it needs.

Figure 17: DeferredZ overall architecture.



5.1 Fragment buffer for correct transparency

This variant of the fragment buffer provides deferred shading and true transparency. Figure 21 shows the
memories including the fragment buffer, depth buffer(s), and attribute buffer(s). Additional depth and
attribute memories may be used, but the rendering algorithm will be described with two Z and attribute
buffers, and other cases will be described following that. The line surrounding the memories are those in
Figure 17 labelled B, fragment buffer and attribute buffers. The fragment buffer may be implemented as a
stack, RAM, queue, or FIFO, because the order in which the fragments are considered is sequential. All
fragments are read in during a phase, and may be requeued.

Referring to Figure 17, [...] performs the geometric processing
necessary to calculate the screen space vertex locations. The HZ block performs Z-buffering as
explored by Ned Greene [4]. At this point are the succeeding [...] by their
vertices. The triangles enter the rasterization stage, where fragments are determined. A fragment in the
DeferredZ architecture is the per sample data with Z, a numerically computed perspective depth, and the
attributes, A, that are used for lighting, texturing, and shading.

Definition: Fragment = Z + Attributes (i.e. material value, normal, texture coordinates).

The processing in the blocks given in Figure 21 consists of the following:

1. Rasterization: take succeeding primitives and compute succeeding fragments.

2. Z compare/Z sort/AA compare/AA sort: Take succeeding fragments and process them to determine
the first set of visible fragments. The number of visible fragments depends upon the number
of Z and attribute buffers, A.

3. Light/shade/texture: Take visible fragments, and compute the lighting, shading, and texturing,
and composite with themselves and the frame buffer.

The calculation performed in the Z compare/AA compare, Z sort/AA sort can be described as occurring
in multiple phases. The first phase is where all triangles (or primitives, where primitives may include
polygons, lines, points, etc.) are rasterized, and sent to the compare and sort block. The compare and sort
block processes them to compute the Z ordering for N layers of attributes. If N is 0, and only the frame
buffer is used, the algorithm is different as mentioned earlier.

Consider two scenarios to simplify the description, first the scenario where there is a single sample per
pixel, and secondly the scenario where multiple samples per pixel are taken. For the first scenario, the
correct ordering of transparent layers is to be determined. Further consider the case where there are two Z
and attribute buffers, the processing [...]. Figure
18 shows the pseudo-code for phase 1. Figure 19 shows the pseudo-code for phase 2. And Figure 20 shows
the pseudo-code for phase 3 and higher. Figure 23 shows for a single pixel a [...] covering
and depth ordering. The eye is shown on the left, and the partially transparent fragments, T_x, are drawn
as a line, while the opaque fragments are drawn as a line with cross hatching. Let us consider this as an
example and describe the processing that occurs during each phase of Z fragment sorting.

To start in phase 1, Figure 18, all primitives are transformed, occlusion tested, then rasterized. As
rasterized fragments are created, they are inserted into the two buffers to determine the closest opaque
fragment and the furthest unoccluded transparent fragment.

Rasterization:
for all primitives
    rasterize to fragments
    pass fragments to Z compare and Z sort

Z compare and Z sort phase 1:
for all fragments (as received from rasterization) {
    fetch Zmin from memory
    if z >= Zmin discard fragment
    else if fragment is opaque {
        write z -> Zmin,
        write fragment info into attributes for shading, lighting etc.
    }
    else /* transparent */ {
        fetch Zfar transparent, and attributes from memory
        if z > Zfar transparent {
            write z -> Zfar transparent and overwrite attributes
            write Zfar old and attributes to fragment buffer <X,Y,Z,A> written
        } /* z > Zfar transparent */
        else
            write Z, attributes to fragment buffer
    } /* else transparent */
} /* for all fragments */

Figure 18: DeferredZ variant, Phase 1 Rasterization and Fragment Sorting.
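The phase 1 logic of Figure 18 can be sketched in runnable Python for a single pixel. This is an illustrative sketch, not the author's implementation: the class and function names are hypothetical, fragments are modelled as `(z, opaque, attributes)` tuples, and an "invalid" Z_far is represented by negative infinity, so the old Z_far is only written to the fragment buffer when one actually exists (an assumption the pseudo-code leaves implicit).

```python
import math
from dataclasses import dataclass, field


@dataclass
class PixelBuffers:
    zmin: float = math.inf        # depth of closest opaque fragment
    a_near: object = None         # its attributes
    zfar: float = -math.inf       # furthest unoccluded transparent fragment
    a_far: object = None          # its attributes
    fragment_buffer: list = field(default_factory=list)


def phase1(px, fragments):
    """Phase 1 Z compare / Z sort for one pixel, following Figure 18."""
    for z, opaque, attr in fragments:
        if z >= px.zmin:
            continue                      # occluded: discard fragment
        if opaque:
            px.zmin, px.a_near = z, attr  # new closest opaque fragment
        elif z > px.zfar:
            if px.zfar != -math.inf:      # spill the old furthest transparent
                px.fragment_buffer.append((px.zfar, px.a_far))
            px.zfar, px.a_far = z, attr
        else:
            px.fragment_buffer.append((z, attr))


px = PixelBuffers()
phase1(px, [(5, False, "T5"), (9, False, "T9"), (10, True, "O"),
            (12, False, "Tx"), (7, False, "T7")])
```

After this run the closest opaque depth is 10, the furthest unoccluded transparent fragment is the one at depth 9, and the fragments at depths 5 and 7 wait in the fragment buffer for the later phases.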

After phase 1, when all primitives have been considered, we have a processed frame buffer Z_near, A_near,
Z_far transparent, A_far transparent, and a fragment buffer for all remaining fragments to be considered
(X, Y, Z_f, A_f) as shown in Figure 22. Next is phase 2, to merge the remaining fragments on the fragment
buffer. For phase 2, unload the current Z and A buffers to the lighting, shading, and compositing stage,
or send Z_near, A_near of the closest opaque layer and Z_far, A_far of the furthest transparent fragment. Note that
in this form of the invention a valid or changed bit would be needed to know which fragments to read out,
unlike the fragment buffer described earlier.

Figure 23 shows a case where O_1, the closest opaque layer, and T_1, the furthest transparent layer, are
discarded. The remaining [...] are all sent to the fragment buffer. In this example, all layers
beyond O_1 were either occluded or discarded [...] buffer. This is not necessarily
the case, as the order of primitives is arbitrary to start, [...] but
later arriving primitives must be occluded in pass 2. In the fragment buffer of the earlier sections this
event was noted by changing [...] to OPAQUE_INV, which means an opaque layer invalidated previous
information. [...] process the remaining [...]
using the same two Z buffers as before. Figure 24 shows the multiple phases, phase 2, phase 3, and phase 4,
to sort the remaining transparent layers. So, given N Z and A or attribute buffers, it will take a number of
passes [...] where D is the worst case depth complexity, D - 1 transparent layers, and 1 opaque layer.

The algorithm for phase 2 is given in Figure 19. [...] the algorithm in
Figure [...], the phase [...] state machine provided [...], the difference here
being 2 full attribute and Z buffers, and a frame buffer following compositing. This aspect of the invention
has not yet been implemented.

set Zfar transparent to -infinity
set Zmin to previous Zmin or closest transparent fragment value
for all fragments (retrieved from fragment buffer) {
    fetch Zmin from memory
    if z >= Zmin discard fragment
    else /* transparent, and not occluded */ {
        fetch Zfar transparent, and attributes from memory
        if z > Zfar transparent {
            if Zfar is valid {
                if Zfirstfromfar is valid {
                    Zfirstfromfar, A go to fragment buffer
                }
                move Zfar, Afar to Zfirstfromfar, Afirstfromfar
            }
        } /* z > Zfar transparent */
        write z, attributes to fragment buffer
    } /* else transparent */
} /* for all fragments */

Figure 19: DeferredZ variant, Phase 2, discarding further layers, and determining next N - 1 transparent
layers.

This demonstrates the logic of replacement during the fragment buffer processing. The key advantage
of the fragment buffer for transparency is that it computes exactly the correct ordering with a fixed Z buffer,
attribute buffer, and a variable fragment buffer. By only reading out fragments that have been updated,
and invalidating those while writing them out, you can get the valid bits for Z_far and Z_firstfromfar reset very
economically. The technique does not preclude image space subdivision, for example, two areas being worked
on, one with lighting and compositing occurring, and one with Z compare Z sort occurring, to effectively
load balance the work between units. Or, for example, passing along any succeeding fragments with texture
values to be processed, and re-Z-buffered, depending on the mix of work. Next to be described is antialiasing
with the same fragment buffer.

for all fragments (retrieved from fragment buffer) {
    if z > Zfar {
        if Zfar is valid {
            if Zfirstfromfar is valid {
                Zfirstfromfar, A go to fragment buffer
            }
            move Zfar, Afar to Zfirstfromfar, Afirstfromfar
        }
        overwrite Zfar with Z
    }
    else if z > Zfirstfromfar {
        if Zfirstfromfar is valid {
            Zfirstfromfar, Afirstfromfar to fragment buffer
        }
        overwrite Zfirstfromfar with Z
    }
} /* for all fragments */

Figure 20: DeferredZ variant, Phase 3 and higher, determining next N transparent layers.
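The replacement logic of Figure 20 can be exercised with a small Python sketch for N = 2. The names `sort_pass` and `back_to_front` are hypothetical, fragments are `(z, attributes)` tuples, and `None` stands for an invalid buffer entry. One step is an assumption that the pseudo-code leaves implicit: a fragment nearer than both kept layers is written back to the fragment buffer for a later pass.

```python
def sort_pass(fragments):
    """One pass over the fragment buffer: keep the two furthest fragments."""
    zfar = afar = None          # furthest fragment seen so far
    zfirst = afirst = None      # "first from far": second furthest
    requeued = []               # fragments written back to the fragment buffer
    for z, attr in fragments:
        if zfar is None or z > zfar:
            if zfar is not None:
                if zfirst is not None:
                    requeued.append((zfirst, afirst))
                zfirst, afirst = zfar, afar     # move Zfar to Zfirstfromfar
            zfar, afar = z, attr                # overwrite Zfar with Z
        elif zfirst is None or z > zfirst:
            if zfirst is not None:
                requeued.append((zfirst, afirst))
            zfirst, afirst = z, attr            # overwrite Zfirstfromfar with Z
        else:
            requeued.append((z, attr))          # assumption: requeue nearer fragment
    kept = [(zz, aa) for zz, aa in ((zfar, afar), (zfirst, afirst))
            if zz is not None]
    return kept, requeued


def back_to_front(fragments):
    """Repeat passes until the fragment buffer is empty."""
    order, passes = [], 0
    pending = list(fragments)
    while pending:
        kept, pending = sort_pass(pending)
        order.extend(kept)      # unloaded to lighting/compositing, furthest first
        passes += 1
    return order, passes


order, passes = back_to_front([(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")])
```

Five transparent layers are resolved in three passes here, two layers per pass, and the unloaded order is furthest first, which is what back-to-front compositing needs.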

5.2 Fragment Stack for A-buffer/Carpenter Antialiasing

An instructional example to demonstrate the flexibility of the invention is to show how the A-buffer, an
antialiasing scheme by Loren Carpenter [3], may be implemented. Some key things to note about the
A-buffer are that it has been often emulated, is a software only technique, and provides correct transparency as
well as antialiasing. Because it is a software only technique, showing how it may be practically implemented
in hardware is a significant advancement beyond the state-of-the-art. Other hardware architectures have
emulated some part of Carpenter's A-buffer, such as Stephanie Winner et al. [12], but with trade-offs such
as handling only a fixed number of fragments per pixel, say 4. Others such as Molnar et al. [$, 10] have
claimed to be able to perform A-buffer processing, but such claims are not supported. PixelFlow cannot
sort transparent layers because their compositing network is only a Z comparison [...]. To
do proper resorting and compositing is an unsolved problem for a PixelFlow, or sort last architecture.

Carpenter defines a fragment as a polygon clipped to a pixel boundary. They have two cases, either a pixel
is fully covered by a polygon, or a pixel is partially covered by transparent and/or opaque polygons.
The data structure is a linked list of fragments, sorted from front-to-back by their minimum Z, shown in
Figure 26. A pixel struct stores both cases of a pixel. Figure 25 shows the pixel struct.

The mask is a 4x8 bit mask, representing subsample locations that the polygon covers. Two Z values are
saved for each fragment, a minimum and maximum Z value, to aid in blending fragments that overlap. The
result of processing is an array of pixel structs, the size of the frame buffer, and linked lists of fragments of
variable length for each pixel.
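The per-pixel structure described above can be sketched in Python as follows. This is not Carpenter's struct layout: the field names are hypothetical, a Python list kept sorted by minimum Z stands in for the linked list, and the 4x8 mask is stored as a 32 bit integer.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class ABufferFragment:
    zmin: float                                   # sort key: nearest depth
    zmax: float = field(compare=False)            # farthest depth of the fragment
    mask: int = field(compare=False)              # 4x8 subsample coverage bits
    color: Tuple[float, float, float] = field(compare=False, default=(0.0, 0.0, 0.0))
    opacity: float = field(compare=False, default=1.0)


def insert_sorted(fragments: List[ABufferFragment], frag: ABufferFragment) -> None:
    """Insert a fragment, keeping the list ordered front-to-back by zmin."""
    bisect.insort(fragments, frag)


def coverage(mask: int) -> float:
    """Fraction of the 32 subsamples covered by the mask."""
    return bin(mask & 0xFFFFFFFF).count("1") / 32


pixel_fragments: List[ABufferFragment] = []
insert_sorted(pixel_fragments, ABufferFragment(3.0, 3.5, 0xFFFFFFFF))
insert_sorted(pixel_fragments, ABufferFragment(1.0, 1.2, 0x0000FFFF))
insert_sorted(pixel_fragments, ABufferFragment(2.0, 2.1, 0xFFFF0000))
```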

Now, to implement the same data structures with the fragment buffer requires sending the fragments
for each pixel on to the fragment buffer, if they are not the closest N fragments, where we have N Z and
attribute buffers. For earlier examples N is 1 or 2, and so I use N = 2 for this example. All polygons
are rendered and converted to fragments, and the 2 fragments closest to the eye are stored in the random
access attribute buffers, while all other fragments are sent to the fragment buffer.

Figure 21: Memories for true transparency and improved antialiasing in the B (fragment buffer and attribute
buffers) portion.

Figure 22: Buffers and fragment buffer after first phase of rasterization and sorting.

Figure 23: Example of succeeding opaque and transparent layers.

Figure 24: Remaining phases (phase 2, 3, and 4) for determining sort order for transparent layers of Figure
23.

Figure 25: A short int is 12 bits, and both an area and opacity are used to more accurately determine pixel
coverage.

Figure 26: A-buffer data structures from Carpenter's software technique.

Figure 27: Fragment buffer for antialiasing, N = 2, Z and attribute buffers, the fragment buffer, and the
hardware modules at this point in the pipeline.



Figure 27 shows a schematic of the fragment buffer, and the two Z and attribute buffers. In the case
for the transparency only, we sorted from back-to-front; now, according to the A-buffer method, we sort
front-to-back, by the fragment's minimum Z, so, in essence, we have the first two fragments for each pixel in a
dedicated random access array, and all others are thrown onto the fragment buffer.

A-buffer processing [...]. Instead, a multipass consideration
of fragments is performed to compute the result. Also, multiple buffers are used when traversing. First,
[...] how the front-to-[...]: the Z_1 and Z_2 buffers store the near Z, while the attribute
buffer stores the far Z, or delta Z, Z_min = Z_min and Z_max = Z_min + Z_maxdelta. This allows fewer bits to be
used for Z_max. All fragments [...] and possible replacement of Z_1 and Z_2 as described
earlier. Recall the earlier example, here shown sorted after phase 1, in Figure 28.



Figure 28: A-buffer example with same fragments as sorted in Figure 23.

Figure 28 shows that now we keep transparent fragments T_5 and T_4 in the Z_1 and Z_2, and all other
fragments are sent to the fragment buffer. The sorting is done by nearest Z to implement the A-buffer. Fragments
T_3, T_2, T_1, O_1 are sent to the fragment buffer, as well as fragments beyond O_1, which would have been
discarded in back-to-front ordering, T_x, O_x, T_x, T_x. We continue to sort as shown below in Figure 29.

Fragments T_3 and T_2 are captured as the next two closest fragments. All other fragments are again sent
to the fragment buffer. In the last pass through the fragments, the transparent layer T_1 and the opaque layer
O_1 are found, and all other fragments are discarded. The lighting and shading/compositing module blends
these fragments as they [...] may cover the pixel, so they are simply processed [...] here. To
show how processing proceeds when fragments partially cover a pixel, I provide an example of coverage. The
fragments are still all sorted front-to-back, but now, they are not necessarily composited in that order. A
recursion is created by recirculating the fragments again.

A fragment's coverage of a pixel determines the inside, subsamples covered by the polygon, and outside,
subsamples not covered by a polygon. Figure 30 shows the coverage of four polygons A, B, C, and D over
a square pixel area. Note that improved fragment representations, such as those that use D_x, D_y slopes
can also be used. I am using Carpenter's algorithm for clarity, and would recommend implementing a more
sophisticated algorithm. The polygons would be processed into fragments for this pixel, with A, B, C, D,
front-to-back after the first phase. The A and B polygon [...] Z_2
buffers. The mask, here shown as 4x4, would be part of the attribute buffer. When the Z_1, A_1 (attribute),
and Z_2, A_2 buffers are unloaded, the search mask is computed. The search mask is the binary mask for the
outside of the current accumulated fragments, M_search. For this example the search mask is first set to the
area outside of the fragment A, M_search = A_out, then this search mask is used to compute in and out,
from front and back of polygon fragment B. The equations are directly from Carpenter, page 105.

Figure 29: A-buffer example continued from Figure 28, continued combining the next two and
the next two.

The interesting thing that we can do is to change a recursive circulation to a purely iterative one,
by knowing that compositing can be made associative if computed with transparencies. The accumulated
transparency is calculated in the shade/composite circuitry, and stored in the frame buffer.
Essentially, take the following equation from Carpenter:

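\[ C = C_{in} \times A_{in} + C_{out} \times (1 - A_{in}) \]

as recursed for our example:

\[
\begin{aligned}
C &= C_{inA} \times A_{inA} + C_{out1} \times (1 - A_{inA}) \\
C_{out1} &= C_{inB} \times A_{inB} + C_{out2} \times (1 - A_{inB}) \\
C_{out2} &= C_{inC} \times A_{inC} + C_{out3} \times (1 - A_{inC}) \\
C_{out3} &= C_{inD} \times A_{inD} + 0
\end{aligned}
\]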

Here $C$ is color, $C_{in}$ is the color from the inside covered area of the fragment, and $A_{in}$
is the area coverage of the fragment. The final color for the pixel is $C$. Here Carpenter's
approach is converted to purely a sum in terms of transparency:

\[ C = C_{inA} \times A_{inA} + t_A \times C_{inB} \times A_{inB} + t_{AB} \times C_{inC} \times A_{inC} + t_{ABC} \times C_{inD} \times A_{inD} \]

Four separate contributions from the four separate layers, where transparency of polygon fragment A
is $t_A = (1 - A_{inA})$. The equations show how the recursion may be converted to direct
evaluation. Figure 30 shows how the masks for coverage of each fragment are initially calculated.
The $A_{in}$ in the compositing equations are those masks, constrained by the search mask. The
search mask is updated as the sorting processes fragments from front-to-back. As before, fragments
A, B, C, and D are sorted front-to-back, and shown in Figure 31. The search mask is updated
front-to-back, and used at each step to compute the $A_{in}$, or as an approximation to the area.
The second column, $M_{search} \cap M_f$, is used for $A_{in}$ at those steps, and each $A_{in}$ is
used to update the accumulating transparency. $A_{in}$ for A equals $M_{in}$ for A, 10/16. Then
$t_A = 1 - 10/16$. $A_{in}$ for B equals $M_{inB} = 3/16$, $t_B = 1 - 3/16$, and
$t_{AB} = (1 - 10/16) \times (1 - 3/16)$. Likewise,
$t_{ABC} = (6/16) \times (13/16) \times (1 - A_{inC})$.

Figure 30: Four polygons' coverage masks for a single pixel.
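The equivalence between the recursive form and the direct sum can be checked numerically. The sketch below is illustrative only: the function names, the single-channel colors, and the coverage values for C and D are assumptions. Only the 10/16 and 3/16 coverages for A and B come from the text. Exact fractions are used so the two results can be compared without rounding.

```python
from fractions import Fraction


def composite_recursive(layers):
    """Recursive form: C = C_in * A_in + C_out * (1 - A_in)."""
    if not layers:
        return Fraction(0)  # nothing behind the last layer
    c_in, a_in = layers[0]
    return c_in * a_in + composite_recursive(layers[1:]) * (1 - a_in)


def composite_iterative(layers):
    """Direct sum: each layer is weighted by the transparency accumulated in front of it."""
    color = Fraction(0)
    transparency = Fraction(1)  # nothing in front of the first layer
    for c_in, a_in in layers:
        color += transparency * c_in * a_in
        transparency *= 1 - a_in
    return color


def accumulated_transparency(areas):
    """Product of (1 - A_in) over the given layers, e.g. t_AB for [A, B]."""
    t = Fraction(1)
    for a_in in areas:
        t *= 1 - a_in
    return t


# (color, coverage) per layer, front to back. Colors and the C, D coverages
# are hypothetical; A = 10/16 and B = 3/16 follow the worked example.
layers = [
    (Fraction(1), Fraction(10, 16)),
    (Fraction(1, 2), Fraction(3, 16)),
    (Fraction(1, 4), Fraction(2, 16)),
    (Fraction(1, 8), Fraction(1, 16)),
]
same = composite_recursive(layers) == composite_iterative(layers)
t_ab = accumulated_transparency([Fraction(10, 16), Fraction(3, 16)])
# t_ab is (6/16) * (13/16), which reduces to 39/128
```

The iterative version needs only a running color and a running transparency per pixel, which is the state the text says is kept by the shade/composite circuitry and the frame buffer.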

Figure 31 shows clearly how the equation using transparencies is built up in time. I show below the
pipeline processing and calculating for the final result. The current fragment and the search mask
are used to compute the in mask and the next search mask.

The search mask is computed by $M_s = M_{out} = M_s \cap \neg M_f$ (current), and the mask for
inside the fragment is computed by $M_{in} = M_s \cap M_f$. The $M_f$ mask, the current fragment
mask, is loaded from one of the attribute buffers, $A_1$ or $A_2$. The masks are used to compute the
opacity, or roughly the area, by summing the bits of the $M_{in}$ mask.
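One step of the mask update can be sketched with integer bitmasks. This is an assumption-laden illustration: the function name `mask_step`, the 16-bit packing of the 4x4 subsample grid, and the two example masks are hypothetical and are not the masks of Figure 30. They were chosen so that the visible areas come out to the 10/16 and 3/16 quoted in the text.

```python
from fractions import Fraction


def mask_step(search_mask, fragment_mask, subsamples=16):
    """Process one fragment against the current search mask.

    Returns (mask_in, next_search_mask, area) where
      mask_in          = search_mask AND fragment_mask
      next_search_mask = search_mask AND NOT fragment_mask
      area             = set bits of mask_in / subsamples
    """
    full = (1 << subsamples) - 1
    mask_in = search_mask & fragment_mask
    next_search = search_mask & ~fragment_mask & full
    area = Fraction(bin(mask_in).count("1"), subsamples)
    return mask_in, next_search, area


# Hypothetical masks: fragment A covers 10 subsamples, fragment B covers 5,
# two of which are hidden behind A.
mask_a = 0x03FF
mask_b = 0x1F00

search = 0xFFFF  # nothing accumulated yet, so the whole pixel is searched
in_a, search, area_a = mask_step(search, mask_a)
in_b, search, area_b = mask_step(search, mask_b)
# area_a is 10/16 and area_b is 3/16
```

Because each step depends only on the stored search mask and the incoming fragment mask, the same function can be applied to fragments in sorted order without revisiting earlier ones.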

When the fragment pixel location has been processed, the search mask for that location must be
stored to memory in the attribute buffer, and reloaded. The same is true for the transparency and
the accumulated color. To implement A-buffer, the search mask would have to be added to random
access memory storage, as in $Z_1$, $A_1$, $Z_2$, $A_2$, and $M_s$ for the sort and compare
memories. This would allow fragments from different pixel locations to be considered in any order,
as the mask state will always be available.



6 Conclusions

The economical implementation of true transparency has been shown through the invention of the
fragment buffer. A simulation was developed for the best fit for a traditional Z buffering
architecture. The detailed hardware architecture and control logic were presented. Simulation
results were shown, as partial validation of the design with several models. The fragment buffer can
provide true transparency with additional off-chip DRAM storage, used in combination with new
comparison, control, and compositing logic. Z buffering architectures already include most of the
comparison logic and compositing. The novel aspects are a state machine per pixel, and multiple
phases of processing recirculating fragments.
Figure 31: $M_{in}$ and $M_{search}$ are computed as the fragments are sorted from front to back,
shown by the timeline going from top to bottom.
The architecture has the advantage over previous work of 1) not requiring high per-pixel dedicated
storage, 2) supporting any depth complexity as long as the average depth complexity is bounded,
3) not requiring on-chip storage, 4) not requiring multiple geometry passes, 5) not requiring
software sorting of primitives, and 6) not being an approximation, like screen door transparency.
The benefits are substantial, but much further work is needed. Examples were provided for how such a
scheme could provide antialiasing, and also work in a tile-based deferred shading architecture.
There is additional off-chip memory required, from 2.3 to 3.6 times more memory for scenes with 12
layers per pixel and average depth complexities of 2.17 to 3.93 (for covered pixels). Because the
fragment access is linear, a graceful degradation by paging fragments to system memory would provide
[...]. The hardware logic is straightforward and can be incorporated into the current Z compare and
composite of a traditional architecture. A key invention is the modification of a recursive software
[...] hardware technique. There are tradeoffs in the amount of memory used versus the [...] that the
ideal architecture will depend on the price point and available semiconductor technology.